Method and apparatus for constructing a 3D geometry

ABSTRACT

Aspects of the disclosure include methods, apparatuses, and non-transitory computer-readable storage mediums for generating a three-dimensional (3D) geometry of a room from a panorama image of the room. An apparatus includes processing circuitry that determines two-dimensional (2D) positions of wall corner points of the room in the panorama image based on a user input. Each wall corner point is in one of a floor plane or a ceiling plane of the room. The processing circuitry calculates 3D positions of the wall corner points based on the 2D positions of the wall corner points, a size of the panorama image, and a distance between the floor plane and a capture position of a device capturing the panorama image, determines a room layout based on an order of the wall corner points, and generates the 3D geometry based on the room layout and the 3D positions of the wall corner points.

INCORPORATION BY REFERENCE

The present application claims the benefit of priority to U.S. Provisional Application No. 63/185,946, “METHODS OF CONSTRUCTING 3D GEOMETRY FROM PANORAMA IMAGES WITH MARKED CORNERS FOR INDOOR SCENES,” filed on May 7, 2021, which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

The present disclosure describes embodiments generally related to reconstruction of a three-dimensional space, including for various virtual reality and/or augmented reality applications.

BACKGROUND

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.

A three-dimensional (3D) reconstruction of an indoor building is an active research topic and has been used in various industries including real estate, building construction, building restoration, entertainment, and the like. The 3D reconstruction can leverage technologies such as computer vision and machine learning by taking a single image (e.g., RGB image) or a group of images from different views as an input to generate a 3D geometry representation of the building in a scene. Advances in depth sensors have enabled even more convenient and more accurate ways of measuring depth information from the scene directly. For example, some widely used depth cameras include Lidar, structured light, and the like.

SUMMARY

Aspects of the disclosure provide apparatuses for generating a three-dimensional (3D) geometry of a room from a panorama image of the room. An apparatus includes processing circuitry that determines two-dimensional (2D) positions of wall corner points of the room in the panorama image of the room based on a user input. Each of the wall corner points is in one of a floor plane or a ceiling plane of the room. The processing circuitry calculates 3D positions of the wall corner points based on the 2D positions of the wall corner points, a size of the panorama image, and a distance between the floor plane of the room and a capture position of a device configured to capture the panorama image of the room. The processing circuitry determines a layout of the room based on an order of the wall corner points. The processing circuitry generates the 3D geometry of the room based on the layout of the room and the 3D positions of the wall corner points.

In an embodiment, the user input includes a user selection of the wall corner points of the room and the order of the wall corner points.

In an embodiment, at least one of the wall corner points is a first type of wall corner point. The first type of wall corner point indicates a wall plane of the 3D geometry.

In an embodiment, at least one of the wall corner points is a second type of wall corner point. The second type of wall corner point indicates an open area plane of the 3D geometry.

In an embodiment, the processing circuitry generates a plane of the 3D geometry based on a type of a predetermined one of two adjacent wall corner points.

In an embodiment, the processing circuitry determines, for each 3D position in a plane of the 3D geometry, color information of the respective 3D position based on color information at a 2D position in the panorama image of the room corresponding to the respective 3D position.

In an embodiment, each wall plane of the 3D geometry is parallel or orthogonal to at least one other wall plane of the 3D geometry and the processing circuitry generates a guide line that assists a user to select one of the wall corner points.

In an embodiment, each wall plane of the 3D geometry is parallel or orthogonal to at least one other wall plane of the 3D geometry and the processing circuitry adjusts one of the wall corner points that is selected by the user.

In an embodiment, the processing circuitry determines 2D positions of two points in the panorama image of the room. The processing circuitry calculates 3D positions of the two points based on the 2D positions of the two points, the size of the panorama image, and the distance between the floor plane of the room and the capture position of the device. The processing circuitry calculates a distance between the 3D positions of the two points.

Aspects of the disclosure provide methods for generating a 3D geometry of a room from a panorama image of the room. The methods can perform any one or a combination of the processes performed by the apparatuses for generating the 3D geometry of the room from the panorama image of the room. In a method, 2D positions of wall corner points of the room in the panorama image of the room are determined based on a user input. Each of the wall corner points is in one of a floor plane or a ceiling plane of the room. 3D positions of the wall corner points are calculated based on the 2D positions of the wall corner points, a size of the panorama image, and a distance between the floor plane of the room and a capture position of a device configured to capture the panorama image of the room. A layout of the room is determined based on an order of the wall corner points. The 3D geometry of the room is generated based on the layout of the room and the 3D positions of the wall corner points.

Aspects of the disclosure also provide non-transitory computer-readable mediums storing instructions which, when executed by at least one processor, cause the at least one processor to perform any one or a combination of the methods for generating a three-dimensional (3D) geometry of a room from a panorama image of the room.

BRIEF DESCRIPTION OF THE DRAWINGS

Further features, the nature, and various advantages of the disclosed subject matter will be more apparent from the following detailed description and the accompanying drawings in which:

FIG. 1A shows an exemplary panorama image of a room according to an embodiment of the disclosure;

FIG. 1B shows an exemplary 3D geometry of the room according to an embodiment of the disclosure;

FIG. 2 shows an example of measuring a height of a white board in a panorama image of another room according to an embodiment of the disclosure;

FIG. 3 shows an example of determining an obstructed point in the panorama image of the other room according to an embodiment of the disclosure;

FIGS. 4A-4D show various examples of room layouts defined by closures and control points according to some embodiments of the disclosure;

FIG. 5 shows exemplary guide lines used in a marking process according to an embodiment of the disclosure;

FIG. 6 shows an exemplary flowchart according to an embodiment of the disclosure; and

FIG. 7 is a schematic illustration of a computer system according to an embodiment of the disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

I. A Three-Dimensional Geometry Construction

This disclosure is related to reconstruction of a three-dimensional (3D) space, such as a room. The room can be in a building, for example. Further, the 3D reconstruction can be used in various virtual reality (VR) and/or augmented reality (AR) applications such as virtual tours, a digital museum, and a virtual home sale. In this disclosure, methods of constructing a 3D geometry of a room from panorama image(s) with hand-marked corners for indoor scenes are described as an example. However, it should be understood that the methods can be applied to other 3D spaces.

A 3D geometry representation of an object is usually in the form of a point cloud, which contains a set of 3D points in space. Each 3D point can include 3D position information and additional attributes such as color information and reflectance information. Another popular 3D format is a textured mesh, which contains connectivity information between neighboring points, in addition to 3D point information. Based on the connectivity information, a collection of facets (e.g., triangles) of the textured mesh can be formed. Texture information of the textured mesh can also be attached to each facet.

In some indoor scene applications, by taking advantage of prior knowledge, some learning based methods (e.g., LayoutNet algorithm, HorizonNet algorithm, and DuLa-Net algorithm) can be used to predict room layout elements such as layout boundaries and corner positions from a single panorama image of a room. However, these algorithms are typically data driven and require high quality training data. The algorithms may fail in production scenarios because of the complexity of real world scenes.

This disclosure includes methods of reconstructing a 3D geometry of a room from a single panorama image of the room with hand-marked (e.g., manually marked by a user) wall corners of the room in the panorama image. It is noted that these methods can be applied in a semi-automatic pipeline. For example, an automatic algorithm (e.g., LayoutNet algorithm, HorizonNet algorithm, or DuLa-Net algorithm) can be first used to generate a rough estimation of the layout corners, and then the methods of this disclosure can be used to refine the layout corners. In another example, the layout corners can be hand-marked by using the methods of this disclosure, and then an automatic refinement algorithm can be applied to the hand-marked corners.

FIG. 1A shows an exemplary panorama image of a room according to an embodiment of the disclosure. In the panorama image, a user can mark one or more wall corners of the room manually. Based on the marked wall corners, a 3D geometry of the room can be generated, as shown in FIG. 1B. The 3D geometry of the room can be represented in the form of a point cloud or a textured mesh.

In methods of this disclosure, a panorama image (I) of a room can be used as an input. It is assumed that a size of the panorama image (I) is W×H in pixels, where W=2H. In addition, an accurate value of a camera height (e.g., a vertical distance from a center position of a camera to the ground plane) can be provided. The camera height can be provided by a user, estimated by algorithms, measured, or set as a default value (e.g., 1.5 meters). The camera height is denoted as H_(cam).

In methods of this disclosure, certain assumptions can be made regarding surfaces of the 3D space. For example, it can be assumed that floors of the room are flat and horizontal, i.e., parallel to the ground plane. Ceilings of the room can likewise be assumed to be flat and parallel to the ground plane. Walls of the room can be assumed to be vertical, and thus perpendicular to the floors. Further, a camera ray can be assumed to be parallel to the ground plane.

In methods of this disclosure, in a camera coordinate system, the center position of the camera can be used as an origin of the world coordinates in a Cartesian coordinate system, i.e., a coordinate of the camera is (0, 0, 0). It can also be assumed that the camera faces towards the positive x axis, the negative z axis is towards the floor plane, and the floor plane and ceiling plane are parallel to the x-y plane. The horizontal vanishing line of the ground plane is at the middle height of the panorama image of the room. Therefore, the z-axis position of the floor plane is −H_(cam).

With the above assumptions, a coordinate of a 3D position in the room can be converted between an image coordinate of a pixel in the panorama image corresponding to the 3D position and a Cartesian coordinate of the 3D position in a camera coordinate system. It is noted that the conversion equations can vary if the assumptions are different.

According to aspects of the disclosure, an image coordinate of a pixel in the panorama image can be converted to a Cartesian coordinate of a 3D position in the camera coordinate system corresponding to the pixel.

In an embodiment, the image coordinate of the pixel in the panorama image is (u, v), where u∈[0,W) and v∈[0,H). It is assumed that the z-axis coordinate of the corresponding 3D position in the camera coordinate system is known. Thus, the image coordinate of the pixel can be converted to the Cartesian coordinate of the corresponding 3D position as follows:

$$x = z \cdot \tan\left(v \cdot \frac{\pi}{H}\right) \cdot \sin\left(u \cdot \frac{2\pi}{W}\right) \qquad \text{(Eq. 1)}$$
$$y = z \cdot \tan\left(v \cdot \frac{\pi}{H}\right) \cdot \cos\left(u \cdot \frac{2\pi}{W}\right) \qquad \text{(Eq. 2)}$$

Then, a horizontal distance (i.e., the distance measured in the x-y plane) between the corresponding 3D position and the camera in the real world can be estimated by $\sqrt{x^{2}+y^{2}}$.

Therefore, if the pixel is on the floor plane, the Cartesian coordinate of the corresponding 3D position in the camera coordinate system can be expressed as follows:

$$x = -H_{cam} \cdot \tan\left(v \cdot \frac{\pi}{H}\right) \cdot \sin\left(u \cdot \frac{2\pi}{W}\right) \qquad \text{(Eq. 3)}$$
$$y = -H_{cam} \cdot \tan\left(v \cdot \frac{\pi}{H}\right) \cdot \cos\left(u \cdot \frac{2\pi}{W}\right) \qquad \text{(Eq. 4)}$$
$$z = -H_{cam} \qquad \text{(Eq. 5)}$$

If the pixel is on the ceiling plane and the ceiling height, i.e., a vertical distance from the ceiling plane to the floor plane, is known as H_(ceil), the Cartesian coordinate of the corresponding 3D position in the camera coordinate system can be expressed as follows:

$$x = \left(H_{ceil} - H_{cam}\right) \cdot \tan\left(v \cdot \frac{\pi}{H}\right) \cdot \sin\left(u \cdot \frac{2\pi}{W}\right) \qquad \text{(Eq. 6)}$$
$$y = \left(H_{ceil} - H_{cam}\right) \cdot \tan\left(v \cdot \frac{\pi}{H}\right) \cdot \cos\left(u \cdot \frac{2\pi}{W}\right) \qquad \text{(Eq. 7)}$$
$$z = H_{ceil} - H_{cam} \qquad \text{(Eq. 8)}$$
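The pixel-to-3D conversion for the floor and ceiling planes can be sketched in Python as follows. This is a minimal illustration under the assumptions stated above (W=2H, camera at the origin, floor at z=−H_(cam)); the function and parameter names are hypothetical and are not part of the disclosure.

```python
import math


def pixel_to_3d(u, v, z, width, height):
    """Apply (Eq. 1)-(Eq. 2) to pixel (u, v) whose plane height z is known."""
    polar = v * math.pi / height           # v * pi / H
    azimuth = u * 2.0 * math.pi / width    # u * 2 * pi / W
    radial = z * math.tan(polar)           # shared factor z * tan(v * pi / H)
    return (radial * math.sin(azimuth), radial * math.cos(azimuth), z)


def floor_pixel_to_3d(u, v, width, height, h_cam):
    """(Eq. 3)-(Eq. 5): the floor plane lies at z = -H_cam."""
    return pixel_to_3d(u, v, -h_cam, width, height)


def ceiling_pixel_to_3d(u, v, width, height, h_cam, h_ceil):
    """(Eq. 6)-(Eq. 8): the ceiling plane lies at z = H_ceil - H_cam."""
    return pixel_to_3d(u, v, h_ceil - h_cam, width, height)
```

For instance, with a 2048×1024 image and a camera height of 1.5 meters, a floor pixel three quarters of the way down the image maps to a point 1.5 meters from the camera in the x-y plane.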

According to aspects of the disclosure, a Cartesian coordinate of a 3D position in the camera coordinate system can be converted to an image coordinate of a pixel in the panorama image corresponding to the 3D position.

When the Cartesian coordinate of the 3D position in the camera coordinate system is (x,y,z), the image coordinate of the corresponding pixel in the panorama image can be expressed as follows:

$$u = \operatorname{arctan2}\left(y, x\right) \cdot \frac{W}{2\pi} \qquad \text{(Eq. 9)}$$
$$v = \operatorname{arctan2}\left(y, \sin\left(\operatorname{arctan2}\left(y, x\right)\right) \cdot z\right) \cdot \frac{H}{\pi} \qquad \text{(Eq. 10)}$$
where arctan2( ) is a function defined as follows:

$$\operatorname{arctan2}\left(y, x\right) = \begin{cases} 2 \cdot \arctan\left(\dfrac{y}{x + \sqrt{x^{2} + y^{2}}}\right) & \text{if } x > 0 \text{ or } y \neq 0 \\ \pi & \text{if } x < 0 \text{ and } y = 0 \\ \text{undefined} & \text{if } x = 0 \text{ and } y = 0 \end{cases} \qquad \text{(Eq. 11)}$$
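The piecewise definition in (Eq. 11) can be written directly in Python, as in the following sketch. The function name is hypothetical. The sketch is only meant to show that the definition agrees with the standard two-argument arctangent; in practice a library routine such as `math.atan2` would normally be preferred, because the half-angle form can divide by a value that rounds to zero when x is negative and y is very small.

```python
import math


def arctan2_eq11(y, x):
    """Two-argument arctangent following the piecewise form of (Eq. 11)."""
    if x > 0 or y != 0:
        # Half-angle form: 2 * arctan(y / (x + sqrt(x^2 + y^2)))
        return 2.0 * math.atan(y / (x + math.hypot(x, y)))
    if x < 0:
        # Negative x axis (y == 0)
        return math.pi
    raise ValueError("arctan2 is undefined for x = 0 and y = 0")
```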

According to aspects of the disclosure, a height of an object in the room can be estimated based on the panorama image of the room. Consider two pixels in the same column of the panorama image, with image coordinates denoted as (u,v₁) and (u,v₂). If the z-axis coordinate of a first 3D position in the camera coordinate system corresponding to the pixel at (u,v₁) is known and is equal to z₁, the z-axis coordinate of a second 3D position in the camera coordinate system corresponding to the other pixel at (u,v₂) can be expressed as follows:

$$z_{2} = z_{1} \cdot \tan\left(v_{1} \cdot \frac{\pi}{H}\right) / \tan\left(v_{2} \cdot \frac{\pi}{H}\right) \qquad \text{(Eq. 12)}$$

One exemplary application is to estimate a height of a point of interest in an object to the floor plane in the panorama image. If the pixel at (u,v₁) is known to be on the floor plane, i.e., z₁=−H_(cam), then the z-axis coordinate of the second 3D position in the camera coordinate system corresponding to the other pixel at (u,v₂) can be expressed as follows:

$$z_{2} = -H_{cam} \cdot \tan\left(v_{1} \cdot \frac{\pi}{H}\right) / \tan\left(v_{2} \cdot \frac{\pi}{H}\right) \qquad \text{(Eq. 13)}$$

If the point of interest in the object is the pixel at (u,v₂), then the height of the point of interest in the object to the floor plane is H_(obj)=z₂−z₁.
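The height estimate of (Eq. 12)-(Eq. 13) can be sketched in Python as follows. The function and parameter names are hypothetical, and the sketch assumes that both pixels lie in the same image column and that the first pixel is on the floor plane.

```python
import math


def z_in_same_column(z1, v1, v2, height):
    """(Eq. 12): z of the point at row v2, given z1 of the point at row v1."""
    return z1 * math.tan(v1 * math.pi / height) / math.tan(v2 * math.pi / height)


def object_height_above_floor(v_floor, v_obj, height, h_cam):
    """(Eq. 13) followed by H_obj = z2 - z1, with z1 = -H_cam on the floor."""
    z1 = -h_cam
    z2 = z_in_same_column(z1, v_floor, v_obj, height)
    return z2 - z1
```

As a quick check of the geometry, a point of interest marked exactly at the middle row of the image yields a height equal to the camera height, which matches the assumption that the horizontal vanishing line is at the middle height of the panorama image.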

FIG. 2 shows an example of measuring a height of a white board in a panorama image of a room according to an embodiment of the disclosure. It is noted that FIG. 2 illustrates a part of the panorama image of the room. In FIG. 2, a vertical line (201) is manually drawn from a floor plane of the room to a bottom side of the white board, so the height of the white board can be estimated automatically. In this way, an object height such as a ceiling height or a desk height can be estimated too. For example, a vertical line (202) shows the ceiling height of the room.

It is noted that the ceiling height of the room can also be set by a user in some embodiments. Thus, once the ceiling height, i.e., H_(ceil), is determined, a point in a floor plane (or in a ceiling plane) can be determined automatically based on a corresponding point in the ceiling plane (or in the floor plane) if the corresponding point is marked by a user. Both points are in the same vertical line or column in the panorama image of the room.

In an embodiment, if the ceiling height is determined and the point in the floor plane of the room is determined (e.g., marked by a user), the corresponding point in the ceiling plane of the room can be determined automatically. Both points are in the same vertical line in the panorama image of the room.

For example, if a point (u₁,v₁) in a panorama image of a room is in a floor plane of the room, a Cartesian coordinate of the point in the camera coordinate system, i.e., (x₁,y₁,z₁), can be calculated by (Eq. 1)-(Eq. 2), where z₁=−H_(cam). Then, a Cartesian coordinate of a corresponding point in a ceiling plane of the room can be calculated as (x₂,y₂,z₂)=(x₁,y₁,z₁+H_(ceil)). Then, an image coordinate of a pixel corresponding to the point in the ceiling plane can be calculated by (Eq. 9)-(Eq. 10).

In an embodiment, if the ceiling height is determined and the point in the ceiling plane of the room is determined (e.g., marked by a user), the corresponding point in the floor plane of the room can be determined automatically. Both points are in the same vertical line in the panorama image of the room.

For example, if the point (u₁,v₁) in the panorama image of the room is in the ceiling plane, the Cartesian coordinate of the point in the camera coordinate system, i.e., (x₁,y₁,z₁), can be calculated by (Eq. 1)-(Eq. 2), where z₁=H_(ceil)−H_(cam). Then, the Cartesian coordinate of the corresponding point in the floor plane can be calculated as (x₂,y₂,z₂)=(x₁,y₁,z₁−H_(ceil)). Then, the image coordinate of the point in the floor plane can be calculated by (Eq. 9)-(Eq. 10).

The above methods can be important because in real scenes it is common that either the point in the floor plane or the point in the ceiling plane is obstructed by an object. In this case, a user can click on a visible point in either the floor plane or the ceiling plane, and the occluded counterpart can be estimated automatically in the panorama image. That is, a corner of a wall can be determined if at least one of the point in the floor plane or the point in the ceiling plane is manually marked.
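One way to estimate the occluded counterpart is sketched below in Python. Because both points share the same image column, the sketch only solves for the row of the counterpart by rearranging (Eq. 12), instead of going through (Eq. 9)-(Eq. 10); this shortcut, together with the function and parameter names, is an assumption made for illustration.

```python
import math


def counterpart_row(v1, z1, z2, height):
    """Row v2 of the point directly above or below the point marked at row v1.

    z1 is the known plane height of the marked point and z2 is the plane
    height of its counterpart. Rearranging (Eq. 12) gives
    tan(v2 * pi / H) = z1 * tan(v1 * pi / H) / z2.
    """
    radial = z1 * math.tan(v1 * math.pi / height)  # shared x-y distance term
    polar2 = math.atan2(radial, z2)                # angle in (0, pi) for radial > 0
    return polar2 * height / math.pi


def floor_row_from_ceiling(v_ceil, height, h_cam, h_ceil):
    """Floor-plane row for a corner whose ceiling point was marked."""
    return counterpart_row(v_ceil, h_ceil - h_cam, -h_cam, height)


def ceiling_row_from_floor(v_floor, height, h_cam, h_ceil):
    """Ceiling-plane row for a corner whose floor point was marked."""
    return counterpart_row(v_floor, -h_cam, h_ceil - h_cam, height)
```

The column u of the counterpart is simply the column of the marked point.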

FIG. 3 shows an example of determining an obstructed point in a panorama image of a room. For example, for corner point pairs marked with numbers “01”, “02”, and “03”, the corner points in a floor plane of the room are obstructed by some chairs and a desk in the room. However, these corner points can still be estimated by clicking the corresponding points in a ceiling plane of the room.

A pair of ceiling and floor corner points that are in the same vertical line of a panorama image of a room can define a vertical straight line in a 3D space. Two pairs of corner points can define a vertical wall plane in the 3D space. For example, in FIG. 3, the corner point pairs “00” and “01” define a wall plane with the white board, the corner point pairs “01” and “02” define a wall plane with the window, the corner point pairs “02” and “03” define a wall plane with the TV, and the corner point pairs “03” and “04” define a wall plane with the glass door. It is noted that the corner point pairs “04” and “00” define an opened door instead of a solid wall plane.

According to aspects of the disclosure, a room layout can be defined by control points. The room layout can be a polygon shape area and include multiple corner points. Some adjacent corner points can form a wall plane, while others can form an opening area. Thus, a closure concept and two types of control points can be described as follows.

A closure can be defined as a set of control points with a certain order. In FIG. 3, the corner point pairs from “00” to “04” form a closure. FIG. 4A shows another example of a closure according to an embodiment of the disclosure. In FIG. 4A, control points from “00” to “03” form a closure. It is noted that the control points in FIGS. 4A-4D are illustrated in two dimensions (2D) with a bird's-eye view of the x-y plane in the camera coordinate system. A control point in FIG. 4A corresponds to a corner point pair (e.g., one corner point is in a ceiling plane and the other corner point is in a floor plane) in real scenes. Since these two corner points have the same x and y coordinates in the camera coordinate system, they are represented by the same control point in FIG. 4A. To determine a 2D position of a control point, a user can mark a corresponding corner point in either the floor plane or the ceiling plane in the panorama image of the room. It is noted that a closure is a loop and is determined based on an order of control points. One control point is connected with its neighbors according to the order, and the last control point is connected with the first control point. For example, “00” and “04” are connected in the closure in FIG. 3.

In some embodiments, different types of wall planes can be distinguished using different types of control points. For example, to distinguish between a solid wall plane and an opening area, two types of control points can be defined. As shown in FIGS. 4A-4D, two types of control points can be represented by solid and patterned circles, respectively. A solid control point can form a solid wall plane with its previous control point, while a patterned control point can form an opening area with its previous control point. For example, all four control points in FIG. 4A are solid control points, thus this closure has four solid walls in a space. In FIG. 4B, however, “00” is a patterned control point, thus a segment from “04” to “00” indicates an opening area (e.g., an opened door or window). In FIG. 4C and FIG. 4D, there are two patterned control points and two solid control points, and thus there are two solid walls and two opening areas (e.g., a balcony and a corridor).
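The closure and the two control point types can be represented with a small data structure, as in the following Python sketch. The class name, field names, and coordinates are hypothetical; the only behavior taken from the description above is that each control point forms a solid wall plane or an opening area with its previous control point, and that the first control point is connected with the last one.

```python
from dataclasses import dataclass


@dataclass
class ControlPoint:
    label: str
    x: float      # bird's-eye x coordinate in the camera coordinate system
    y: float      # bird's-eye y coordinate in the camera coordinate system
    solid: bool   # True: solid control point, False: patterned control point


def closure_segments(points):
    """List (previous label, current label, kind) for every segment of a closure."""
    segments = []
    for i, current in enumerate(points):
        previous = points[i - 1]  # index -1 wraps to the last point, closing the loop
        kind = "wall" if current.solid else "opening"
        segments.append((previous.label, current.label, kind))
    return segments


# Illustrative closure resembling FIG. 3: "00" is patterned, so "04"-"00" is an opening.
closure = [
    ControlPoint("00", 0.0, 0.0, solid=False),
    ControlPoint("01", 4.0, 0.0, solid=True),
    ControlPoint("02", 4.0, 3.0, solid=True),
    ControlPoint("03", 1.0, 3.0, solid=True),
    ControlPoint("04", 0.0, 2.0, solid=True),
]
```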

It is noted that a closure is not limited to construction of a solid wall plane based on solid control points, but can also construct a floor plane and/or a ceiling plane. A polygon defined by all control points can identify a shape of the floor plane and/or the ceiling plane. Thus, patterned control points can be as important as solid control points.

In one embodiment, a computer software (or program) can distinguish between the two types of control points by different input types, such as a left click or a right click of a computer mouse.

In one embodiment, the computer software (or program) can distinguish the two types of control points with the aid of a keyboard, such as different keyboard inputs.

In some embodiments, more than one closure can be used to describe a room layout of the scene in a single panorama image. The multiple closures are independent from each other and can have different ceiling heights. Therefore, the multiple closures can be used to represent a scene of multiple rooms with different ceiling heights.

Once the room layout is defined by the closure(s) and the control points, the 3D geometry of the scene can be recovered by constructing the wall planes, the floor plane(s), and the ceiling plane(s). The 3D geometry positions of the control points can be calculated based on (Eq. 1)-(Eq. 8). Then, 3D positions in each plane (e.g., a wall plane, a ceiling plane, a floor plane, or another plane) can be calculated by interpolation from the control points. The image coordinate of each interpolated position can be calculated based on (Eq. 9)-(Eq. 10), and color information at the image coordinate of each interpolated position in the panorama image can be applied as color information at the respective interpolated position in the 3D geometry. Thus, a colored point cloud or a textured mesh of the scene can be constructed.
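The interpolation step for a single wall plane can be sketched in Python as follows. The sketch assumes simple bilinear sampling on a regular grid between two adjacent control points and between the floor and ceiling heights; the function name, the grid resolution parameters, and the sampling scheme are illustrative assumptions. The color lookup through (Eq. 9)-(Eq. 10) is omitted.

```python
def sample_wall_plane(p0, p1, z_floor, z_ceil, n_horizontal, n_vertical):
    """Interpolate 3D positions on the wall plane spanned by two control points.

    p0 and p1 are the (x, y) positions of adjacent control points; z_floor and
    z_ceil are the heights of the floor and ceiling planes.
    """
    if n_horizontal < 2 or n_vertical < 2:
        raise ValueError("at least two samples are needed in each direction")
    points = []
    for i in range(n_horizontal):
        s = i / (n_horizontal - 1)              # 0 at p0, 1 at p1
        x = p0[0] + s * (p1[0] - p0[0])
        y = p0[1] + s * (p1[1] - p0[1])
        for j in range(n_vertical):
            t = j / (n_vertical - 1)            # 0 at the floor, 1 at the ceiling
            points.append((x, y, z_floor + t * (z_ceil - z_floor)))
    return points
```

Each returned position could then be projected back into the panorama image to fetch its color, which would yield a colored point cloud for that wall.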

In some embodiments, certain assumptions regarding the arrangement of the walls can be made. For example, the Manhattan world assumption can be used in generating a 3D geometry of a 3D space to improve the quality of the generated 3D geometry. In the Manhattan world assumption, it is assumed that the walls are either parallel or orthogonal to each other.

In one embodiment, the Manhattan world assumption can be used to guide a marking process of a user. FIG. 5 shows an exemplary display of guide lines that can be used in the marking process according to an embodiment of the disclosure. In FIG. 5, as a solid wall plane is formed after the control points “00” and “01” are marked, a major direction in the Manhattan world can be determined. Then, when a user moves a cursor to find the next control point “02”, a set of guide lines can be generated to help the user find it quickly. In FIG. 5, the curves (501)-(504) stretched out from the next control point “02” are preview guide lines. These curves depict two orthogonal wall planes that intersect at the “02” position in the panorama image. In this way, a degree of convenience and an accuracy of marking the control point “02” can be improved even if the floor and ceiling corner points are both obstructed.

In one embodiment, the Manhattan world assumption can be used to refine a user-marked position. For example, if a user-marked position in the panorama image is close to but not exactly on a guide line, the user-marked position can be slightly adjusted to be consistent with the Manhattan world assumption. In this way, it is guaranteed that the generated 3D geometry can satisfy the Manhattan world assumption, thus yielding a more accurate construction.
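A possible form of the refinement is sketched below in Python. The sketch works in the bird's-eye x-y plane rather than directly in the panorama image, and it snaps the marked position to whichever of two candidate lines through the previous control point is closer: the line parallel to the major direction or the line orthogonal to it. The function name and this particular snapping rule are assumptions for illustration only.

```python
import math


def snap_to_manhattan(first, second, marked):
    """Snap a marked (x, y) position so that the new wall is parallel or
    orthogonal to the wall from `first` to `second`.

    `first` and `second` are already-marked control points that define the
    major direction; the new wall runs from `second` to the snapped position.
    """
    dx, dy = second[0] - first[0], second[1] - first[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("the first two control points must be distinct")
    ux, uy = dx / norm, dy / norm      # unit vector along the major direction
    px, py = -uy, ux                   # unit vector orthogonal to it
    rx, ry = marked[0] - second[0], marked[1] - second[1]
    along = rx * ux + ry * uy          # component along the major direction
    across = rx * px + ry * py         # component orthogonal to it
    if abs(across) <= abs(along):
        # Closer to the parallel guide line: drop the orthogonal component.
        return (second[0] + along * ux, second[1] + along * uy)
    # Closer to the orthogonal guide line: drop the parallel component.
    return (second[0] + across * px, second[1] + across * py)
```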

According to aspects of the disclosure, a 3D geometry (e.g., a point cloud or a textured mesh) can be constructed by marking a set of wall corner points in a ceiling plane and/or a floor plane in a panorama image of a room. The wall corner points can be marked by a user manually in an embodiment. In another embodiment, the wall corner points can be first estimated, such as by an automatic algorithm (e.g., LayoutNet algorithm, HorizonNet algorithm, or DuLa-Net algorithm), and then modified by a user. In another embodiment, the wall corner points can be first marked by a user and then refined, such as by an automatic algorithm.

In some embodiments, a distance (or a dimension) in the real world can be measured or estimated based on marked points in the panorama image of the room. The distance in the real world is from a position in the real world corresponding to a marked point in the panorama image to a camera device that is configured to capture the image. A 3D coordinate of the position in the real world can be located in a camera coordinate system in which the camera device can be located at (0, 0, 0), for example.

In one embodiment, an object height (e.g., a vertical distance from the object to a floor plane of the room) can be estimated by marking a point of the object in the panorama image of the room. When the object height is known (or estimated) or otherwise determined, a 3D coordinate of a point in the real world corresponding to an arbitrary point in the panorama image can be estimated. The 3D coordinate of the position in the real world can be located in the camera coordinate system in which the camera is located at (0, 0, 0), for example. In addition, when the object height is known (or estimated) or otherwise determined, a distance between two positions in the real world can be estimated. The two positions can correspond to two arbitrary points in the panorama image of the room.
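The distance between two real-world positions can be sketched in Python as follows, assuming the z-axis coordinate of each position is already known (for example, −H_(cam) for a point on the floor plane). The function and parameter names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def point_at_height(u, v, z, width, height):
    """(Eq. 1)-(Eq. 2): 3D position of pixel (u, v) lying at known height z."""
    radial = z * math.tan(v * math.pi / height)
    azimuth = u * 2.0 * math.pi / width
    return (radial * math.sin(azimuth), radial * math.cos(azimuth), z)


def distance_between_pixels(pixel_a, z_a, pixel_b, z_b, width, height):
    """Euclidean distance between the 3D positions of two marked pixels."""
    a = point_at_height(pixel_a[0], pixel_a[1], z_a, width, height)
    b = point_at_height(pixel_b[0], pixel_b[1], z_b, width, height)
    return math.dist(a, b)
```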

In some embodiments, a room layout can be defined by a group of marked points (also referred to as control points) in the panorama image. The marked points in a specific order can define a closure that corresponds to the room layout. Two types of marked points can be used, in which a first type of marked points can form a solid wall plane of the room and a second type of marked points can form an opening area of the room.

As noted above, in one embodiment, a computer software (or program) can distinguish the two types of control points by a left click or a right click of a computer mouse. In one embodiment, the computer software (or program) can distinguish the two types of control points with the aid of a keyboard.

For a single panorama image, more than one closure can be used to describe the room layout of the scene. The multiple closures are independent from each other and can have different ceiling heights. Therefore, the multiple closures can be used to represent a scene of multiple rooms with different ceiling heights.

Based on the room layout, a 3D geometry representation of the scene can be constructed. The 3D geometry can be either a point cloud with color and normal vector information, or a mesh with texture information. The color and texture information can be obtained by interpolation from the panorama image.

In some embodiments, certain assumptions such as the Manhattan world assumption can be applied in constructing the 3D geometry of the scene.

In one embodiment, the Manhattan world assumption can be used to guide a marking process. For example, a set of guide lines can be generated in the panorama image based on the Manhattan world assumption. In the marking process, a user can follow the set of guide lines to mark points in the panorama image.

In one embodiment, the Manhattan world assumption can be used to refine a user-marked position. For example, if the user-marked position is not on any of the set of guide lines, the user-marked position can be automatically adjusted to be on the closest guide line.

II. Flowchart

FIG. 6 shows a flow chart outlining an exemplary process (600) according to an embodiment of the disclosure. In various embodiments, the process (600) is executed by processing circuitry, such as the processing circuitry shown in FIG. 7. In some embodiments, the process (600) is implemented in software instructions, thus when the processing circuitry executes the software instructions, the processing circuitry performs the process (600).

The process (600) may generally start at step (S610), where the process (600) determines 2D positions of wall corner points of the room in the panorama image of the room based on a user input. Each of the wall corner points is in one of a floor plane or a ceiling plane of the room. Then, the process (600) proceeds to step (S620).

At step (S620), the process (600) calculates 3D positions of the wall corner points based on the 2D positions of the wall corner points, a size of the panorama image, and a distance between the floor plane of the room and a capture position of a device configured to capture the panorama image of the room. Then, the process (600) proceeds to step (S630).
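As a non-limiting illustration of step (S620), the following sketch maps a floor corner from its 2D pixel position to a 3D position. It assumes an equirectangular panorama, a capture position at the origin with the z axis pointing up, and a floor plane at z = -camera_height. The projection model, the coordinate convention, and the name floor_corner_to_3d are assumptions made for this sketch, not formulas quoted from the disclosure.

```python
import math

def floor_corner_to_3d(u, v, width, height, camera_height):
    """Map a floor-plane pixel (u, v) of an equirectangular panorama to 3D.

    width, height: size of the panorama image in pixels.
    camera_height: distance between the floor plane and the capture position.
    Assumption: u maps linearly to longitude and v to latitude.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi   # -pi at left edge, +pi at right
    lat = math.pi / 2.0 - (v / height) * math.pi  # +pi/2 at top, -pi/2 at bottom
    if lat >= 0:
        raise ValueError("a floor point must lie below the horizon")
    # The viewing ray drops camera_height over this horizontal distance.
    dist = camera_height / math.tan(-lat)
    return (dist * math.cos(lon), dist * math.sin(lon), -camera_height)
```

Under these assumptions, a corner on the ceiling plane could be handled analogously once its horizontal position is known, for example from the floor corner directly below it.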

At step (S630), the process (600) determines a layout of the room based on an order of the wall corner points. Then, the process (600) proceeds to step (S640).

At step (S640), the process (600) generates the 3D geometry of the room based on the layout of the room and the 3D positions of the wall corner points. Then, the process (600) terminates.

In an embodiment, the user input includes a user selection of the wall corner points of the room and the order of the wall corner points.

In an embodiment, at least one of the wall corner points is a first type of wall corner point. The first type of wall corner point indicates a wall plane of the 3D geometry.

In an embodiment, at least one of the wall corner points is a second type of wall corner point. The second type of wall corner point indicates an open area plane of the 3D geometry.

In an embodiment, the processing circuitry generates a plane of the 3D geometry based on a type of a predetermined one of two adjacent wall corner points.
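As a non-limiting illustration, the generation of planes from ordered wall corner points can be sketched as follows. The sketch assumes that the corner points form a closed loop, that the predetermined one of two adjacent corner points is the first of the pair, and that each plane is a vertical quadrilateral between the floor and ceiling heights. The Corner class, the build_planes function, and these choices are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Corner:
    x: float
    y: float
    is_wall: bool  # True: first type (wall plane); False: second type (open area)

def build_planes(corners, floor_z, ceiling_z):
    """Create one vertical plane per pair of adjacent corners, in marked order."""
    planes = []
    n = len(corners)
    for i in range(n):
        a, b = corners[i], corners[(i + 1) % n]  # wrap around to close the loop
        quad = [
            (a.x, a.y, floor_z), (b.x, b.y, floor_z),
            (b.x, b.y, ceiling_z), (a.x, a.y, ceiling_z),
        ]
        # Assumption: the plane type is taken from the first corner of the pair.
        planes.append({"kind": "wall" if a.is_wall else "open", "quad": quad})
    return planes
```

With four corners marked in order, this sketch yields four planes, and a corner of the second type makes the plane that starts at that corner an open area plane.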

In an embodiment, the process (600) determines, for each 3D position in a plane of the 3D geometry, color information of the respective 3D position based on color information at a 2D position in the panorama image of the room corresponding to the respective 3D position.
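As a non-limiting illustration, the color lookup for a 3D position can be sketched as follows. The sketch assumes an equirectangular panorama with the capture position at the origin and the z axis pointing up, represents the image as a list of rows of (r, g, b) tuples, and uses nearest-pixel sampling rather than interpolation for brevity. The function names point_to_pixel and sample_color are hypothetical.

```python
import math

def point_to_pixel(x, y, z, width, height):
    """Project a 3D point to fractional pixel coordinates (u, v)."""
    lon = math.atan2(y, x)                 # longitude in [-pi, pi]
    lat = math.atan2(z, math.hypot(x, y))  # latitude in [-pi/2, pi/2]
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return u, v

def sample_color(image, x, y, z):
    """Return the color of the pixel that the 3D point (x, y, z) projects to."""
    height = len(image)
    width = len(image[0])
    u, v = point_to_pixel(x, y, z, width, height)
    col = int(u) % width            # wrap horizontally across the seam
    row = min(int(v), height - 1)   # clamp at the bottom edge
    return image[row][col]
```

Replacing the nearest-pixel lookup with bilinear interpolation over the four neighboring pixels would give smoother textures at the cost of a slightly longer routine.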

In an embodiment, each wall plane of the 3D geometry is parallel or orthogonal to at least one other wall plane of the 3D geometry, and the process (600) generates a guide line that assists a user to select one of the wall corner points.

In an embodiment, each wall plane of the 3D geometry is parallel or orthogonal to at least one other wall plane of the 3D geometry, and the process (600) adjusts one of the wall corner points that is selected by the user.

In an embodiment, the process (600) determines 2D positions of two points in the panorama image of the room. The process (600) calculates 3D positions of the two points based on the 2D positions of the two points, the size of the panorama image, and the distance between the floor plane of the room and the capture position of the device. The process (600) calculates a distance between the 3D positions of the two points.

III. Computer System

The techniques described above can be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example, FIG. 7 shows a computer system (700) suitable for implementing certain embodiments of the disclosed subject matter.

The computer software can be coded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or like mechanisms to create code comprising instructions that can be executed directly, or through interpretation, micro-code execution, and the like, by one or more computer central processing units (CPUs), Graphics Processing Units (GPUs), and the like.

The instructions can be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smartphones, gaming devices, internet of things devices, and the like.

The components shown in FIG. 7 for computer system (700) are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiment of a computer system (700).

Computer system (700) may include certain human interface input devices. Such a human interface input device may be responsive to input by one or more human users through, for example, tactile input (such as: keystrokes, swipes, data glove movements), audio input (such as: voice, clapping), visual input (such as: gestures), or olfactory input (not depicted). The human interface devices can also be used to capture certain media not necessarily directly related to conscious input by a human, such as audio (such as: speech, music, ambient sound), images (such as: scanned images, photographic images obtained from a still image camera), and video (such as two-dimensional video, three-dimensional video including stereoscopic video).

Input human interface devices may include one or more of (only one of each depicted): keyboard (701), mouse (702), trackpad (703), touch screen (710), data-glove (not shown), joystick (705), microphone (706), scanner (707), and camera (708).

Computer system (700) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (for example tactile feedback by the touch-screen (710), data-glove (not shown), or joystick (705), but there can also be tactile feedback devices that do not serve as input devices), audio output devices (such as: speakers (709), headphones (not depicted)), visual output devices (such as screens (710) to include CRT screens, LCD screens, plasma screens, OLED screens, each with or without touch-screen input capability, each with or without tactile feedback capability, some of which may be capable of outputting two-dimensional visual output or more than three-dimensional output through means such as stereographic output; virtual-reality glasses (not depicted), holographic displays and smoke tanks (not depicted)), and printers (not depicted). These visual output devices (such as screens (710)) can be connected to a system bus (748) through a graphics adapter (750).

Computer system (700) can also include human accessible storage devices and their associated media such as optical media including CD/DVD ROM/RW (720) with CD/DVD or the like media (721), thumb-drive (722), removable hard drive or solid state drive (723), legacy magnetic media such as tape and floppy disc (not depicted), specialized ROM/ASIC/PLD based devices such as security dongles (not depicted), and the like.

Those skilled in the art should also understand that the term “computer readable media” as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.

Computer system (700) can also include a network interface (754) to one or more communication networks (755). The one or more communication networks (755) can, for example, be wireless, wireline, or optical. The one or more communication networks (755) can further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of the one or more communication networks (755) include local area networks such as Ethernet, wireless LANs, cellular networks to include GSM, 3G, 4G, 5G, LTE and the like, TV wireline or wireless wide area digital networks to include cable TV, satellite TV, and terrestrial broadcast TV, vehicular and industrial to include CANBus, and so forth. Certain networks commonly require external network interface adapters that attach to certain general purpose data ports or peripheral buses (749) (such as, for example, USB ports of the computer system (700)); others are commonly integrated into the core of the computer system (700) by attachment to a system bus as described below (for example Ethernet interface into a PC computer system or cellular network interface into a smartphone computer system). Using any of these networks, computer system (700) can communicate with other entities. Such communication can be uni-directional, receive only (for example, broadcast TV), uni-directional send-only (for example CANbus to certain CANbus devices), or bi-directional, for example to other computer systems using local or wide area digital networks. Certain protocols and protocol stacks can be used on each of those networks and network interfaces as described above.

Aforementioned human interface devices, human-accessible storage devices, and network interfaces can be attached to a core (740) of the computer system (700).

The core (740) can include one or more Central Processing Units (CPU) (741), Graphics Processing Units (GPU) (742), specialized programmable processing units in the form of Field Programmable Gate Arrays (FPGA) (743), hardware accelerators for certain tasks (744), graphics adapters (750), and so forth. These devices, along with Read-only memory (ROM) (745), Random-access memory (746), internal mass storage (747) such as internal non-user accessible hard drives, SSDs, and the like, may be connected through the system bus (748). In some computer systems, the system bus (748) can be accessible in the form of one or more physical plugs to enable extensions by additional CPUs, GPUs, and the like. The peripheral devices can be attached either directly to the core's system bus (748), or through a peripheral bus (749). In an example, the screen (710) can be connected to the graphics adapter (750). Architectures for a peripheral bus include PCI, USB, and the like.

CPUs (741), GPUs (742), FPGAs (743), and accelerators (744) can execute certain instructions that, in combination, can make up the aforementioned computer code. That computer code can be stored in ROM (745) or RAM (746). Transitional data can also be stored in RAM (746), whereas permanent data can be stored, for example, in the internal mass storage (747). Fast storage and retrieval to any of the memory devices can be enabled through the use of cache memory, which can be closely associated with one or more CPU (741), GPU (742), mass storage (747), ROM (745), RAM (746), and the like.

The computer readable media can have computer code thereon for performing various computer-implemented operations. The media and computer code can be those specially designed and constructed for the purposes of the present disclosure, or they can be of the kind well known and available to those having skill in the computer software arts.

As an example and not by way of limitation, the computer system having architecture (700) and specifically the core (740) can provide functionality as a result of processor(s) (including CPUs, GPUs, FPGA, accelerators, and the like) executing software embodied in one or more tangible, computer-readable media. Such computer-readable media can be media associated with user-accessible mass storage as introduced above, as well as certain storage of the core (740) that is of a non-transitory nature, such as core-internal mass storage (747) or ROM (745). The software implementing various embodiments of the present disclosure can be stored in such devices and executed by core (740). A computer-readable medium can include one or more memory devices or chips, according to particular needs. The software can cause the core (740) and specifically the processors therein (including CPU, GPU, FPGA, and the like) to execute particular processes or particular parts of particular processes described herein, including defining data structures stored in RAM (746) and modifying such data structures according to the processes defined by the software. In addition or as an alternative, the computer system can provide functionality as a result of logic hardwired or otherwise embodied in a circuit (for example: accelerator (744)), which can operate in place of or together with software to execute particular processes or particular parts of particular processes described herein. Reference to software can encompass logic, and vice versa, where appropriate. Reference to a computer-readable medium can encompass a circuit (such as an integrated circuit (IC)) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware and software.

While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of the disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within the spirit and scope thereof.

What is claimed is:
 1. A method of generating a three-dimensional (3D) representation of a room from a panorama image of the room, the method comprising: determining two-dimensional (2D) positions of wall corner points of the room in the panorama image of the room based on a user selection of the wall corner points captured in the panorama image, each of the wall corner points being in one of a floor plane or a ceiling plane of the room; calculating 3D positions of the wall corner points based on the 2D positions of the wall corner points, a size of the panorama image, and a distance between the floor plane of the room and a capture position of a device configured to capture the panorama image of the room; determining a layout of the room based on an order of the wall corner points; and generating the 3D representation of the room based on the layout of the room and the 3D positions of the wall corner points.
 2. The method of claim 1, wherein the determining the 2D positions of the wall corner points comprises: determining the 2D positions of the wall corner points of the room in the panorama image of the room based on an order in which the wall corner points are selected.
 3. The method of claim 1, wherein at least one of the wall corner points is a first type of wall corner point, the first type of wall corner point indicating a wall plane of the 3D representation.
 4. The method of claim 1, wherein at least one of the wall corner points is a second type of wall corner point, the second type of wall corner point indicating an open area plane of the 3D representation.
 5. The method of claim 1, wherein the generating comprises: generating a plane of the 3D representation based on a type of a predetermined one of two adjacent wall corner points.
 6. The method of claim 1, further comprising: determining, for each 3D position in a plane of the 3D representation, color information of the respective 3D position based on color information at a 2D position in the panorama image of the room corresponding to the respective 3D position.
 7. The method of claim 1, wherein each wall plane of the 3D representation is parallel or orthogonal to at least one other wall plane of the 3D representation, and the method further includes generating a guide line that assists a user to select one of the wall corner points.
 8. The method of claim 1, wherein each wall plane of the 3D representation is parallel or orthogonal to at least one other wall plane of the 3D representation, and the method further includes adjusting one of the wall corner points that is selected by the user.
 9. The method of claim 1, further comprising: determining 2D positions of two points in the panorama image of the room; calculating 3D positions of the two points based on the 2D positions of the two points, the size of the panorama image, and the distance between the floor plane of the room and the capture position of the device; and calculating a distance between the 3D positions of the two points.
 10. An apparatus, comprising: processing circuitry configured to: determine two-dimensional (2D) positions of wall corner points of a room in a panorama image of the room based on a user selection of the wall corner points captured in the panorama image, each of the wall corner points being in one of a floor plane or a ceiling plane of the room; calculate three-dimensional (3D) positions of the wall corner points based on the 2D positions of the wall corner points, a size of the panorama image, and a distance between the floor plane of the room and a capture position of a device configured to capture the panorama image of the room; determine a layout of the room based on an order of the wall corner points; and generate a 3D representation of the room based on the layout of the room and the 3D positions of the wall corner points.
 11. The apparatus of claim 10, wherein the processing circuitry is configured to: determine the 2D positions of the wall corner points of the room in the panorama image of the room based on an order in which the wall corner points are selected.
 12. The apparatus of claim 10, wherein at least one of the wall corner points is a first type of wall corner point, the first type of wall corner point indicating a wall plane of the 3D representation.
 13. The apparatus of claim 10, wherein at least one of the wall corner points is a second type of wall corner point, the second type of wall corner point indicating an open area plane of the 3D representation.
 14. The apparatus of claim 10, wherein the processing circuitry is further configured to: generate a plane of the 3D representation based on a type of a predetermined one of two adjacent wall corner points.
 15. The apparatus of claim 10, wherein the processing circuitry is further configured to: determine, for each 3D position in a plane of the 3D representation, color information of the respective 3D position based on color information at a 2D position in the panorama image of the room corresponding to the respective 3D position.
 16. The apparatus of claim 10, wherein each wall plane of the 3D representation is parallel or orthogonal to at least one other wall plane of the 3D representation, and the processing circuitry is further configured to: generate a guide line that assists a user to select one of the wall corner points.
 17. The apparatus of claim 10, wherein each wall plane of the 3D representation is parallel or orthogonal to at least one other wall plane of the 3D representation, and the processing circuitry is further configured to: adjust one of the wall corner points that is selected by the user.
 18. The apparatus of claim 10, wherein the processing circuitry is further configured to: determine 2D positions of two points in the panorama image of the room; calculate 3D positions of the two points based on the 2D positions of the two points, the size of the panorama image, and the distance between the floor plane of the room and the capture position of the device; and calculate a distance between the 3D positions of the two points.
 19. A non-transitory computer-readable storage medium storing instructions which, when executed by at least one processor, cause the at least one processor to perform: determining two-dimensional (2D) positions of wall corner points of a room in a panorama image of the room based on a user selection of the wall corner points captured in the panorama image, each of the wall corner points being in one of a floor plane or a ceiling plane of the room; calculating three-dimensional (3D) positions of the wall corner points based on the 2D positions of the wall corner points, a size of the panorama image, and a distance between the floor plane of the room and a capture position of a device configured to capture the panorama image of the room; determining a layout of the room based on an order of the wall corner points; and generating a 3D representation of the room based on the layout of the room and the 3D positions of the wall corner points.
 20. The non-transitory computer-readable storage medium of claim 19, wherein the determining the 2D positions of the wall corner points comprises: determining the 2D positions of the wall corner points of the room in the panorama image of the room based on an order in which the wall corner points are selected.